Abstract

Reasoning about a program by treating program variables as sets of “values” leads to a simple, accurate and intuitively appealing notion of program approximation. This paper presents such an approach for the compile-time analysis of ML programs. To develop the core ideas of the analysis, we consider a simple untyped call-by-value functional language. Starting with an operational semantics for the language, we develop an approximate “set based” operational semantics which formalizes the intuition of treating program variables as sets. The key result of the paper is an O(n³) algorithm for computing the set based approximation of a program. We then show how the analysis can be extended in a natural way to deal with arrays, arithmetic, exceptions and continuations. We also briefly describe results from an implementation of this analysis for ML programs.
1 Introduction


The motivation of this paper is the development of a compile-time analysis of functional programs that combines the following properties:

• It provides relatively accurate information about a program, with particular emphasis on the program’s data structures.

• The underlying notion of approximation has a simple uniform definition, and the results of the analysis are intuitive and predictable.

• It is flexible and easily modified to incorporate arithmetic and operations such as assignment, callcc and exception handling.

• It is practical.

To this end, we consider an approach to program analysis based solely on the idea of ignoring inter-variable dependencies. To illustrate what is meant by inter-variable dependencies, consider the following two ML programs.


    let fun mk_list (u, v) = [u, v]
    in
      mk_list(1,2);
      mk_list(3,4)
    end

Program 1

    let fun append(x :: xs, y) = x :: append(xs, y)
          | append(nil, y') = y'
        fun rev(z :: zs) = append(rev zs, [z])
          | rev nil = nil
    in rev [1,2,3,4]
    end

Program 2


During the execution of Program 1, the body of the function mk_list is executed in two environments: [u ↦ 1, v ↦ 2] and [u ↦ 3, v ↦ 4]. Inter-variable dependencies arise here in the sense that the variable u takes the value 1 exactly when v takes 2, and u takes 3 exactly when v takes 4. In general, we say that inter-variable dependencies arise whenever the set of environments encountered at some program point is such that fixing a value for one or more variables restricts the possible values of the other variables.


Such dependencies may be ignored by treating the program variables as denoting sets of values instead of individual values. In Program 1, the sets for u and v are {1, 3} and {2, 4} respectively. If program variables are treated as sets, then the result of Program 1 is approximated by the set of values {[1,2], [1,4], [3,2], [3,4]}, in contrast to its actual result, which is the single value [3,4].
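The collapse from individual environments to a single set environment can be pictured with a small Python sketch. It is only an illustration under simplifying assumptions: lists are modelled as Python tuples, and `collapse` and `body_results` are hypothetical helper names rather than anything from the paper's implementation. It merges the two environments seen by the body of mk_list and enumerates every value the body [u, v] can then produce.

```python
def collapse(envs):
    """Merge a list of environments (dicts from variable to value)
    into one set environment (dict from variable to set of values)."""
    set_env = {}
    for env in envs:
        for var, val in env.items():
            set_env.setdefault(var, set()).add(val)
    return set_env


def body_results(set_env):
    """Every value of the body [u, v] when u and v are drawn independently
    from their sets; lists are modelled as tuples (an assumption)."""
    return {(u, v) for u in set_env["u"] for v in set_env["v"]}


# The two environments in which the body of mk_list runs in Program 1.
envs = [{"u": 1, "v": 2}, {"u": 3, "v": 4}]
set_env = collapse(envs)
print(sorted(body_results(set_env)))  # [(1, 2), (1, 4), (3, 2), (3, 4)]
```

Once the environments are merged, the pairing between u = 1 and v = 2 is lost, which is exactly why [1,4] and [3,2] appear in the approximation.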


In Program 2, dependencies arise between the variables x, xs and y in the function append, and between z and zs in the function rev. If the values of variables are collected into sets, then we obtain the set {1, 2, 3, 4} for both x and z. Using this information, a set based interpretation of the program can be developed as follows. Consider the definition of append. From the first clause, we see that the values returned by append include the values 1 :: l, 2 :: l, 3 :: l and 4 :: l, where l is some list returned by append. From the second clause, the values returned by append include any value of y'; noting the call append(rev zs, [z]) in the definition of rev, these values include the singleton lists [1], [2], [3] and [4]. Combining these two observations, it is easy to see that the set based interpretation of Program 2 yields the set of all lists constructed from 1, 2, 3 and 4.
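The reasoning about Program 2 amounts to finding the least solution of two set inclusions: the results R of append contain every value of y', and contain x :: l for every x in {1, 2, 3, 4} and every l in R. The Python sketch below iterates those inclusions. It is an illustration only: lists are modelled as tuples with () for nil, the function name is hypothetical, and because the true answer is infinite it is truncated at an assumed length bound `max_len` so that the loop terminates.

```python
def set_based_rev_results(elems, max_len):
    """Least solution, truncated to lists of length <= max_len, of the
    set inclusions for Program 2 when variables are treated as sets."""
    # y' only ever receives the singleton lists [z] built by rev.
    y_values = {(z,) for z in elems}
    # R: values returned by append; the second clause contributes every y'.
    returned = set(y_values)
    while True:
        # First clause: x :: l for any x in elems and any l already in R.
        new = {(x,) + l for x in elems for l in returned if len(l) < max_len}
        if new <= returned:
            break
        returned |= new
    # rev returns append's results, or nil for the empty list.
    return returned | {()}


results = set_based_rev_results({1, 2, 3, 4}, 4)
print((4, 3, 2, 1) in results, (2, 2, 1) in results)  # True True
```

The actual result [4,3,2,1] is in the set, but so is [2,2,1], which the program can never produce: the price of ignoring the dependencies between x, xs and y.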

The notion of set based approximation can be extended in a variety of ways. For example, consider programs involving arrays. In keeping with the methodology of ignoring dependencies, we shall ignore the dependencies between subscripts and array values. In essence, we treat an array as a set of values: when the array is updated, a new value is added to this set, and when the array is subscripted, the whole set is returned. For example, the set based approximation of Program 3 yields the set of all values obtained by summing one or more 3’s and 4’s, i.e. {3n + 4m : n ≥ 0, m ≥ 0, n + m ≥ 1}.

    let fun cum (arr : int array) =
          let fun f 0 = arr sub 0
                | f i = (arr sub i) + f(i - 1)
          in
            f ((length arr) - 1)
          end
        val arr = array(10, 3)
    in
      update(arr, 6, 4);
      cum arr
    end

Program 3

    fun map f (x :: l) = (f x) :: (map f l)
      | map f nil = nil
    val l = [1,2,3]
    val d = dynamic
    val u = map (fn x => (x, d)) l
    val v = map (fn (x, y) => x) u
    val w = map (fn (x, y) => y) u

Program 4
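For Program 3 the same idea gives two inclusions: the array is the single set A = {3, 4} (the initial element 3 plus the value 4 written by update), and the results R of f satisfy R ⊇ A and R ⊇ {a + r : a ∈ A, r ∈ R}. The Python sketch below iterates these inclusions to a fixed point. It is illustrative only; `set_based_cum` and the cut-off `bound` are hypothetical, and the bound is needed only because the true set is infinite.

```python
def set_based_cum(array_values, bound):
    """Results of f in Program 3, truncated to values <= bound, when the
    array is treated as one set of values and subscripts are ignored."""
    results = set()
    while True:
        # f 0 = arr sub 0: any array element.
        new = {a for a in array_values if a <= bound}
        # f i = (arr sub i) + f(i - 1): any element plus any earlier result.
        new |= {a + r for a in array_values for r in results if a + r <= bound}
        if new <= results:
            return results
        results |= new


print(sorted(set_based_cum({3, 4}, 12)))  # [3, 4, 6, 7, 8, 9, 10, 11, 12]
```

Below the bound the output matches {3n + 4m : n ≥ 0, m ≥ 0, n + m ≥ 1}; the program's actual result, 9·3 + 4 = 31, lies in the set once the bound is large enough.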

Set based approximation can also be extended to deal with non-standard values. For example, to perform a binding time analysis [5, 13, 17], a non-standard value dynamic is introduced to represent a value that will not be known until “run-time”. To illustrate this, consider Program 4. The set based approximation of this program yields the following information¹ about the variables u, v and w: u is a list of pairs whose first element is either 1, 2 or 3 and whose second element is dynamic; v is a list of 1’s, 2’s and 3’s; and w is a list of dynamic’s.

To summarize, the analysis developed here is based on the notion of ignoring inter-variable dependencies by treating variables as sets. In other words, the environments encountered at each point in a program are collapsed into a single set environment (a mapping from variables into sets). Strictly speaking, there are three kinds of dependencies that are ignored in set based analysis. First, dependencies between different variables are ignored; this was illustrated by Program 1. Second, dependencies between different occurrences of the same variable are ignored. For example, the approximation of Program 5 yields {[1,1], [1,2], [2,1], [2,2]} and not {[1,1], [2,2]}. Third, dependencies between the domain and codomain of functions are ignored. For example, the approximation of Program 6 yields {2, 3} and not {3}.

¹ We note that this information is in fact obtained only if the analysis of map is “polyvariant” (that is, provided there can be different “versions” of map). We refer to this issue later in the paper.
    let fun f x = [x, x]
    in
      f 1;
      f 2
    end

Program 5

    let fun g 1 = 2
          | g 2 = 3
    in
      g 1;
      g 2
    end

Program 6
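The second and third kinds of dependency can be checked with a similar sketch. In the Python below (illustrative only; the function names are hypothetical and lists are again modelled as tuples), the parameter x of f in Program 5 denotes the set {1, 2}, and each of its two occurrences is drawn from that set independently. For Program 6, the parameter of g collects the arguments 1 and 2, and every call of g returns the whole range of g rather than the result for its particular argument.

```python
from itertools import product


def program5_results(arg_set):
    # f x = [x, x]: the two occurrences of x range over arg_set independently.
    return set(product(arg_set, repeat=2))


def program6_results(arg_set):
    # fun g 1 = 2 | g 2 = 3
    clauses = {1: 2, 2: 3}
    # The range of g: every result g returns for any argument it receives.
    g_range = {clauses[a] for a in arg_set if a in clauses}
    # Each call returns the whole range, so that is the approximation.
    return g_range


print(sorted(program5_results({1, 2})))  # [(1, 1), (1, 2), (2, 1), (2, 2)]
print(sorted(program6_results({1, 2})))  # [2, 3]
```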


In this paper we do two things. First, we develop the underlying ideas of set based analysis. We give a simple and natural formalization of the notion of set based approximation, and then present an algorithm (based on constructing and solving set constraints) for computing this approximation. This is carried out in the context of a small untyped call-by-value functional language that is intended to be suggestive of a number of aspects of ML [15]. Second, we describe an implementation of the set based analysis of ML programs, which extends the basic notions of set based analysis to arithmetic, arrays, continuations and exceptions. This implementation is built on the lambda intermediate representation of the SML/NJ compiler [3]. Typical execution times range from less than a second for small programs to a couple of seconds for moderate-sized programs of the order of 1000 lines². The implementation subsumes and generalizes aspects of type analysis, safety analysis, control flow analysis, structure sharing and usage analysis, interval analysis, and binding time analysis. Applications of the implementation include improved code generation (the information obtained can be used to guide data-structure representation and in-lining, and to remove redundant tests such as array bounds checking), partial evaluation and checking static program properties (for example, better checking of non-exhaustive pattern matching).


Related Literature

The idea of defining program approximation by treating program variables as sets of values has been used previously in the analysis of logic programs and imperative programs by Jaffar and the present author [7, 8, 9]. In the context of functional programs, work on type inference by Mishra and Reddy [16] is similar in spirit, although in [16] substantial restrictions are placed on types (for example, they must be “tuple-distributive”).

More closely related is work by Aiken and Wimmers [2], who extract type constraints from a program and provide a normalization procedure for solving these constraints over the domain of downward closed sets of finite elements (essentially the “ideal” model of types). However, the constraints used and their simplification algorithm are very different from those used in the present paper. Constraints have also been used in binding time analysis [11] and safety analysis [18]. In the former, the program approximation that arises is different from set based approximation (and in fact less accurate), but can be computed in almost-linear time. In the latter (which is based on closure analysis), the constraints are solved over subsets of a finite domain of “closures”. In contrast, our constraints are solved over an infinite domain.

² It also runs all of the examples given in this section.
Perhaps the most closely related work is by Jones [12], where a grammar approach is presented to the analysis of lazy higher-order functional programs. The main aspect of our work that sets it apart from other work is that we start with a simple, intuitive definition of approximate semantics based on an operational semantics, and only then present algorithms (using constraints) that correspond exactly with this approximation. Moreover, we extend this analysis to deal with side effects and continuations in a uniform and intuitive manner.


2 Set Based Approximation

Consider a simple call-by-value functional language whose terms e are defined by

$$ e ::= x \mid c(e_1,\ldots,e_n) \mid \lambda x.e \mid e_1\,e_2 \mid \mathrm{case}(e_1,\ c(x_1,\ldots,x_n) \Rightarrow e_2,\ y \Rightarrow e_3) \mid \mathrm{fix}\,x.e $$

where x and y range over program variables and c ranges over a given set of (varying arity, “first-order”) constants. It is convenient to adopt the usual convention that each bound variable is distinct. The operator case is essentially a very restricted form of the ML case expression and provides a mechanism for branching on the result of a computation as well as “deconstructing” values. The operator fix serves to express recursion³.

The operational semantics for the evaluation of the closed terms of this language is given in Figure 1. The variables E and v range over environments and values respectively, and these are defined mutually recursively as follows. An environment E is a finite mapping from program variables into binding values. A binding value is either a value or an expression of the form fix x.e. A value v is of the form c(v₁, ..., vₙ), where the vᵢ are values, or a closure of the form (E, λx.e), where E is an environment. If E is an environment, then we write dom(E) to denote the (finite) set of variables on which E is defined. The notation E[x ↦ exp] denotes the environment which maps x into exp and all other variables x' into E(x'). We write ⊢ e → v if E ⊢ e → v when E is the empty environment.

We now modify the operational semantics so that dependencies between variables are ignored. This is achieved by treating program variables as sets of values. To formalize this, first define that a set environment ℰ is a finite mapping from variables into sets of binding values. We write E ∈ ℰ to denote that E(x) ∈ ℰ(x) for all x ∈ dom(E). The set based operational semantics, presented in Figure 2, is essentially obtained by replacing environments E in the rules of Figure 1 by set environments⁴ ℰ. This replacement necessitates two kinds of changes to the rules. First, the two variable rules var-1 and var-2 are modified to accommodate the fact that ℰ(x) is a set. Second, the rules that involve variable binding (app, case-1, case-2 and fix) are modified so that the binding information is dropped.

Observe that this second group of rules will, in general, lead to an unsound approximation.

³ In fix x.e, the expression e will typically be an abstraction.

⁴ We remark that one reason for the explicit use of environments in the operational semantics in Figure 1 is precisely to enhance this intuition. However, the notion of set based approximation is not limited to this style of semantics. Analogous definitions can be made starting from an operational semantics that uses substitution.
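$$ E \vdash x \to E(x) \quad (E(x) \neq \mathrm{fix}\,y.e) \qquad \text{(VAR-1)} $$

$$ \frac{E \vdash \mathrm{fix}\,y.e \to v}{E \vdash x \to v} \quad (E(x) = \mathrm{fix}\,y.e) \qquad \text{(VAR-2)} $$

$$ \frac{E \vdash e_1 \to (E', \lambda x.e) \qquad E \vdash e_2 \to v' \qquad E'[x \mapsto v'] \vdash e \to v}{E \vdash e_1\,e_2 \to v} \qquad \text{(APP)} $$

$$ \frac{E \vdash e_i \to v_i,\ i = 1..n}{E \vdash c(e_1,\ldots,e_n) \to c(v_1,\ldots,v_n)} \qquad \text{(CONST)} $$

$$ E \vdash \lambda x.e \to (E, \lambda x.e) \qquad \text{(ABS)} $$

$$ \frac{E \vdash e_1 \to c(v_1,\ldots,v_n) \qquad E[x_1 \mapsto v_1,\ldots,x_n \mapsto v_n] \vdash e_2 \to v}{E \vdash \mathrm{case}(e_1,\ c(x_1,\ldots,x_n) \Rightarrow e_2,\ y \Rightarrow e_3) \to v} \qquad \text{(CASE-1)} $$

$$ \frac{E \vdash e_1 \to c'(v_1,\ldots,v_n) \qquad E[y \mapsto c'(v_1,\ldots,v_n)] \vdash e_3 \to v}{E \vdash \mathrm{case}(e_1,\ c(x_1,\ldots,x_n) \Rightarrow e_2,\ y \Rightarrow e_3) \to v} \quad (c \neq c') \qquad \text{(CASE-2)} $$

$$ \frac{E[x \mapsto \mathrm{fix}\,x.e] \vdash e \to v}{E \vdash \mathrm{fix}\,x.e \to v} \qquad \text{(FIX)} $$

Figure 1: Operational Semantics for the Simple Language.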
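To make the rules of Figure 1 concrete, here is a minimal Python interpreter for the simple language, written as a sketch under stated assumptions: the class names Var, Con, Abs, App, Case, Fix, ConVal and Closure and the function `evaluate` are hypothetical, environments are Python dicts, and Python recursion stands in for derivation trees. Each branch is labelled with the rule it implements.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Con:                       # c(e1, ..., en); a constant has args == ()
    c: str
    args: Tuple = ()

@dataclass(frozen=True)
class Abs:                       # \x.e
    param: str
    body: Any

@dataclass(frozen=True)
class App:                       # e1 e2
    fn: Any
    arg: Any

@dataclass(frozen=True)
class Case:                      # case(e1, c(x1,...,xn) => e2, y => e3)
    scrut: Any
    c: str
    xs: Tuple
    then: Any
    y: str
    other: Any

@dataclass(frozen=True)
class Fix:                       # fix x.e
    name: str
    body: Any

@dataclass(frozen=True)
class ConVal:                    # the value c(v1, ..., vn)
    c: str
    args: Tuple = ()

@dataclass(frozen=True)
class Closure:                   # the closure (E, \x.e)
    env: Dict[str, Any]
    lam: Abs


def evaluate(env, e):
    """E |- e -> v, following the rules of Figure 1."""
    if isinstance(e, Var):
        b = env[e.name]
        if isinstance(b, Fix):                          # VAR-2
            return evaluate(env, b)
        return b                                        # VAR-1
    if isinstance(e, Con):                              # CONST
        return ConVal(e.c, tuple(evaluate(env, a) for a in e.args))
    if isinstance(e, Abs):                              # ABS
        return Closure(env, e)
    if isinstance(e, App):                              # APP
        f = evaluate(env, e.fn)
        v = evaluate(env, e.arg)
        return evaluate({**f.env, f.lam.param: v}, f.lam.body)
    if isinstance(e, Case):
        v = evaluate(env, e.scrut)
        if not isinstance(v, ConVal):
            raise TypeError("case applied to a closure")
        if v.c == e.c:                                  # CASE-1
            return evaluate({**env, **dict(zip(e.xs, v.args))}, e.then)
        return evaluate({**env, e.y: v}, e.other)       # CASE-2
    if isinstance(e, Fix):                              # FIX
        return evaluate({**env, e.name: e}, e.body)
    raise TypeError(f"not a term: {e!r}")


# (\f. c(f a, f b)) (\x. x)
e0 = App(Abs("f", Con("c", (App(Var("f"), Con("a")), App(Var("f"), Con("b"))))),
         Abs("x", Var("x")))
print(evaluate({}, e0))  # ConVal(c='c', args=(ConVal(c='a', args=()), ConVal(c='b', args=())))
```

Note how FIX binds x to the unevaluated expression fix x.e, and VAR-2 unfolds it again whenever x is looked up; this is what makes recursion work without building a cyclic environment.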
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£  F  x  v  (v  €  S(x),  v  ^  fix  y.e) 

(VAR-1) 

£  H  fix  y.e  v  .  ' 

(var-2) 

h  (E, Xx.e)  £  \-  e2~~*  v'  £  \-  e^v 

£  1-  e\  e2  v 

(APP) 

£  1-  e\  v,-,  *  =  l..n 
£  h  c(ci,...tcn)  c(vt|. • . i®») 

(CONST) 

£  1-  Ai.e-v*  (E,\x.e)  (E  €  £) 

(ABS) 

£  h  ej  ^  c( ,t>n)  £  f-  e2~~*v 

£  h  otwe(ei,  e(x1,...,*n)  =»  e2,  y  =»  e3) -n>  » 

(CASE-1) 

£  F  «i -n*  cVi,...,**)  £\-t}-*v  {c  f  d) 

£  h  case(ci,  c(x i, ...,*„)=*•  e2,  y  =»  ea) v 

(CASE-2) 

£  t-  e  v 
£  1-  fix  x.e  v 

(FIX) 

Figure  2:  Set  Based  Operational  Semantics. 
That is, certain set environments ℰ will be such that, for some closed terms e₀, ⊢ e₀ → v but ℰ ⊬ e₀ ↝ v. We shall, however, always ensure that whenever one of these rules is applied, ℰ is “sufficiently large” that it contains all bindings to variables. For rule app, the required condition on ℰ is v' ∈ ℰ(x). Similar conditions can be given for the other rules involving binding. Note that it is not appropriate to just add these conditions as side conditions to the respective rules, since this would have the effect of reducing the number of derivations. Instead, we require that whenever one of the potentially unsafe rules is applied, the extra conditions are always met. To formalize this, define that ℰ is safe with respect to a closed term e₀ if, for every derivation ending in ℰ ⊢ e₀ ↝ v, the following four conditions are met (we follow the notation established in Figure 2):

1. Every application of the rule app is such that v' ∈ ℰ(x).

2. Every application of the rule case-1 is such that vᵢ ∈ ℰ(xᵢ), i = 1..n.

3. Every application of the rule case-2 is such that c'(v₁, ..., vₙ) ∈ ℰ(y).

4. Every application of the rule fix is such that fix x.e ∈ ℰ(x).

Importantly, safety implies soundness in the following sense:

Theorem 1 (Soundness) If ℰ is safe wrt a closed term e₀, then ⊢ e₀ → v implies ℰ ⊢ e₀ ↝ v.

Proof: The proof follows by structural induction on the subderivations of the derivation ⊢ e₀ → v. The induction hypothesis must be strengthened slightly to include a simple property about the closures that may be encountered. □

In essence, this proves that if we guess ℰ so that it is safe, then the set based operational semantics provides a sound approximation of the execution of a term. However, given a term e₀, there are many correct choices for ℰ, and these give rise to different approximations of e₀. The following proposition implies that, given e₀, there is a canonical choice for ℰ, and that this choice gives rise to the most accurate approximation of e₀. First, define that the intersection of set environments ℰ₁ and ℰ₂, denoted ℰ₁ ∩ ℰ₂, is given by (ℰ₁ ∩ ℰ₂)(x) ≝ ℰ₁(x) ∩ ℰ₂(x), provided ℰ₁(x) and ℰ₂(x) are both defined. Then:

Proposition 1 (Minimality) If ℰ₁ and ℰ₂ are safe wrt a closed term e₀, then so is ℰ₁ ∩ ℰ₂. Moreover, ℰ₁ ∩ ℰ₂ ⊢ e₀ ↝ v implies ℰ₁ ⊢ e₀ ↝ v and ℰ₂ ⊢ e₀ ↝ v.

Proof: The proof here is straightforward and follows from the observation that any derivation ℰ₁ ∩ ℰ₂ ⊢ e₀ ↝ v can be replayed to give isomorphic derivations ℰ₁ ⊢ e₀ ↝ v and ℰ₂ ⊢ e₀ ↝ v. □
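The two operations on set environments used so far, membership E ∈ ℰ and intersection ℰ₁ ∩ ℰ₂, are easy to state concretely. The Python sketch below assumes, for illustration only, that an environment is a dict from variable names to hashable binding values and a set environment is a dict from variable names to Python sets; `intersect` and `is_member` are hypothetical names.

```python
def intersect(env1, env2):
    """(E1 ∩ E2)(x) = E1(x) ∩ E2(x) for every x on which both are defined."""
    return {x: env1[x] & env2[x] for x in env1.keys() & env2.keys()}


def is_member(env, set_env):
    """E ∈ ℰ: E(x) ∈ ℰ(x) for all x in dom(E)."""
    return all(x in set_env and val in set_env[x] for x, val in env.items())


e1 = {"u": {1, 3}, "v": {2, 4}}
e2 = {"u": {1, 2}, "w": {5}}
print(intersect(e1, e2))  # {'u': {1}}
```

Intersection is pointwise, which is why any derivation under ℰ₁ ∩ ℰ₂ can be replayed under either operand: every variable lookup that succeeds in the intersection also succeeds in ℰ₁ and in ℰ₂.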

This motivates the following definition⁵.

⁵ It is possible to give a direct definition of set based approximation, which avoids the minimization over ℰ. However, such a definition is substantially more complex.
Definition 1 (Set Based Approximation) Let e₀ be a closed term. Let ℰ_min be the least set environment that is safe wrt e₀. The set based approximation of e₀, denoted sba(e₀), is defined by:

$$ \mathrm{sba}(e_0) = \{\, v : \mathcal{E}_{\min} \vdash e_0 \rightsquigarrow v \,\} $$

□

To summarize, the set based operational semantics approximates the execution of a term by collapsing all environments into one single set environment. No other form of approximation is employed. In particular, no use is made of abstract domains (such as those commonly employed in abstract-interpretation styles of program analysis [6]). We remark that the results of the analysis are typically infinite sets of values, and that we make no a priori requirement that these sets be finitely presentable.


3 Main Result

We now present the main result of the paper, which is an algorithm for computing sba(e₀) for any closed term e₀. The structure of the algorithm is as follows. First, we construct set constraints corresponding to the input term e₀. In essence, these constraints express relationships between sets of values in such a way that a model of the constraints corresponds to the set based execution of e₀ in some safe set environment ℰ. Importantly, the least model of these constraints corresponds to execution in the smallest safe set environment, and hence to sba(e₀). The second part of the algorithm is a simplification procedure for set constraints. In essence, this procedure constructs an explicit representation of the least model of the input set constraints. This representation is in the form of a regular tree grammar. Note that no assumptions have been made about the adequacy of regular tree grammars: the fact that the least model of the set constraints (and hence sba(e₀)) can be represented using regular tree grammars is a by-product of the correctness proof of the algorithm.

Before describing the form of the set constraints employed by the algorithm, we first note that the environment part of closures in sba(e₀) is essentially redundant. In particular, if ℰ is the least set environment that is safe with respect to a closed term e₀, and if sba(e₀) contains a closure (E, λx.e), then sba(e₀) must in fact contain all closures of the form (E', λx.e) such that E' ∈ ℰ. This is because the set based operational semantics collapses all environments into the single set environment ℰ, and moreover, the only closures generated during the set based execution are via the (abs) rule. In the computation of sba(e₀), it is convenient to drop the redundant environment information in closures⁶. More formally, define an operator ||v|| on values v, which forgets the environment part of closures, as follows:

$$ \|v\| = \begin{cases} c & \text{if } v \text{ is the constant } c \\ c(\|v_1\|,\ldots,\|v_n\|) & \text{if } v \text{ is } c(v_1,\ldots,v_n) \\ \lambda x.e & \text{if } v \text{ is } (E, \lambda x.e) \end{cases} $$

⁶ We note that this can be recovered if needed, although it is not completely trivial to do so, since values and environments are mutually dependent.
The algorithm presented in this section computes a representation (using regular tree grammars) of the set ||sba(e₀)|| = {||v|| : v ∈ sba(e₀)}.
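The forgetful operator ||·|| is a simple structural recursion. The Python sketch below assumes a hypothetical tuple encoding of values, ("con", c, args) for c(v₁, ..., vₙ) (with args == () for a constant) and ("closure", env, lam) for (E, λx.e), with abstractions written as strings. It also shows that closures differing only in their environments collapse to a single sc-value.

```python
def forget(v):
    """||v||: drop the environment part of every closure inside v.
    Hypothetical encoding: ("con", c, args) for c(v1, ..., vn) and
    ("closure", env, lam) for (E, λx.e)."""
    if v[0] == "closure":
        return v[2]                       # (E, λx.e) becomes λx.e
    _, c, args = v
    return ("con", c, tuple(forget(a) for a in args))


c1 = ("closure", {"f": ("con", "a", ())}, "λx.x")
c2 = ("closure", {"f": ("con", "b", ())}, "λx.x")
print({forget(c1), forget(c2)})  # {'λx.x'}
```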


Set Constraints

The use of set constraints for the analysis of programs dates back to the early works by Reynolds [19], and Jones and Muchnick [14], which employ constraints involving projection. The calculus of set constraints was first defined and studied in a general setting by Jaffar and the present author [9], which also contained a decision procedure for a class of set constraints involving projection and intersection. Later works have provided algorithms for different classes of set constraints (Aiken and Murphy [1] have dealt with complementation and projection; Jaffar and the present author have dealt with set constraint operators that are designed for analyzing logic programs and imperative programs [7, 8], and combinations of set constraint techniques and abstract interpretation techniques [10]), as well as providing alternative proofs of previous results (Bachmair, Ganzinger and Waldmann [4] establish a connection between certain kinds of set constraints and a fragment of logic shown decidable by Löwenheim, and in the process provide alternative proofs of the earlier results in [1] and [9]).

We extend the basic set constraint calculus of [9] by adding operations to model function application and case statements. The form and meaning of these constraints are defined in the context of some given closed term e₀. We assume a fixed infinite class of set variables; set variables shall be denoted W, X, Y, Z. We distinguish two special disjoint subclasses of set variables. First, for each program variable x in e₀, there is a distinct set variable X_x which shall be used to capture all of the values for the program variable x. Second, for each abstraction λx.e appearing in e₀, there is a distinct set variable ran(λx.e), the “range” of λx.e, which shall be used to capture all of the values returned by applications of λx.e during execution. Now, in the context of the given term e₀, we define that a set expression (se) is either a set variable, an abstraction λx.e that appears in e₀, or of one of the forms c(se₁, ..., seₙ), apply(se₁, se₂), case(se₁, c(X₁, ..., Xₙ) ⇒ se₂, Y ⇒ se₃) or ifnonempty(se₁, se₂) (which shall be used later). The first of these forms is used to model execution of expressions c(e₁, ..., eₙ), the second models application, the third is for case statements, and the last is used to reason about emptiness. A set constraint is an expression of the form X ⊇ se, and a conjunction C of set constraints is a finite collection of set constraints.

We now define the meaning of the set constraints. In essence, set expressions shall be interpreted as sets of values with the environment component of closures removed. Specifically, a set constraint value (sc-value) is either an abstraction λx.e that appears in e₀, or of the form c(v₁, ..., vₙ) where each vᵢ is an sc-value. An interpretation is a mapping from each set variable into a set of sc-values. Such an interpretation I is extended to map set expressions to sets of sc-values as follows:

1. $I(c(se_1,\ldots,se_n)) = \{\, c(v_1,\ldots,v_n) : v_i \in I(se_i),\ i = 1..n \,\}$;

2. $I(\lambda x.e) = \{\lambda x.e\}$;

3. $I(\mathrm{ifnonempty}(se_1, se_2)) = \{\}$ if $I(se_1) = \{\}$, and $I(se_2)$ otherwise;

4. $I(\mathrm{apply}(se_1, se_2)) = \{\, v : \lambda x.e \in I(se_1) \wedge I(se_2) \neq \{\} \wedge v \in I(\mathrm{ran}(\lambda x.e)) \,\}$, provided $\lambda x.e \in I(se_1)$ implies $I(se_2) \subseteq I(X_x)$;

5. $I(\mathrm{case}(se_1,\ c(X_1,\ldots,X_n) \Rightarrow se_2,\ Y \Rightarrow se_3)) = S_1 \cup S_2$, where
   (i) $S_1 = \{\, v : v \in I(se_2) \wedge \exists\, c(v_1,\ldots,v_n) \in I(se_1) \,\}$ and
   (ii) $S_2 = \{\, v : v \in I(se_3) \wedge \exists\, c'(v_1,\ldots,v_n) \in I(se_1) \text{ s.t. } c' \neq c \,\}$,
   provided that
   (iii) $c(v_1,\ldots,v_n) \in I(se_1)$ implies $v_i \in I(X_i)$, $i = 1..n$, and
   (iv) $c'(v_1,\ldots,v_n) \in I(se_1)$ where $c' \neq c$ implies $c'(v_1,\ldots,v_n) \in I(Y)$.

Note that the above interpretation of set expressions is somewhat unusual, because in parts 4 and 5 of the definition, the set expressions themselves impose restrictions on I. If these conditions are not met, then the interpretation of the expression is undefined. An interpretation I is a model of a conjunction of constraints C if, for each constraint X ⊇ se in C, I(se) is defined and I(X) ⊇ I(se). It is easy to verify a model intersection property for the set constraints used in this paper, and it follows that a conjunction C of constraints possesses a least model, denoted lm(C), where models are ordered as follows: I₁ ⊇ I₂ if I₁(X) ⊇ I₂(X) for all set variables X.


Constructing Set Constraints

The construction of set constraints from a term is described in Figure 3. (Strictly speaking, this is a somewhat simplified version; the complete version appears in Appendix I.) In the rules (app), (const), (abs) and (case), the variable Y is intended to be a new set variable that is not used in any other part of the derivation. Using these rules, we define:

Definition 2 Let e₀ be a closed term. Then SC(e₀) is the pair (X, C) such that ▷ e₀ : (X, C). □


To illustrate the construction of the constraints, consider the term e₀ = e₁ e₂, where e₁ is λf.c(f a, f b), e₂ is λx.x, and a, b and c are constants⁷. For this term, we derive ▷ e₀ : (X₁, C), where C consists of the constraints

$$ \begin{array}{lll} X_1 \supseteq \mathrm{apply}(X_2, X_3) & X_4 \supseteq \mathrm{apply}(X_f, a) & \mathrm{ran}(e_1) \supseteq c(X_4, X_5) \\ X_2 \supseteq e_1 & X_5 \supseteq \mathrm{apply}(X_f, b) & \mathrm{ran}(e_2) \supseteq X_x \\ X_3 \supseteq e_2 & & \end{array} $$

In lm(C), X₁ = ran(e₁) = {c(a,b), c(b,a), c(a,a), c(b,b)}, X₂ = {e₁}, X₃ = {e₂}, and X₄ = X₅ = ran(e₂) = X_x = {a, b}.
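Because every set in this particular example is finite, lm(C) can be computed by naive iteration, which gives a way to check the values just stated. The Python sketch below is an illustration under assumptions, not the paper's algorithm: the tuple encoding of set expressions, the table PARAM, and the names `ev` and `least_model` are hypothetical, and the loop only terminates when the least model is finite. In general lm(C) is infinite, which is why the algorithm instead builds a regular tree grammar.

```python
from itertools import product

# Hypothetical encoding of set expressions:
#   ("var", X) | ("lam", name) | ("con", c, (se, ...)) | ("apply", se1, se2)
# Constructed sc-values are pairs (c, (v1, ..., vn)); abstractions are names.
PARAM = {"e1": "X_f", "e2": "X_x"}   # the set variable X_x of each abstraction


def ran(name):
    return "ran(" + name + ")"


def ev(se, I):
    """I(se) under the current interpretation I (a dict of sets)."""
    tag = se[0]
    if tag == "var":
        return I.get(se[1], set())
    if tag == "lam":
        return {se[1]}
    if tag == "con":
        return {(se[1], vs) for vs in product(*[ev(a, I) for a in se[2]])}
    if tag == "apply":
        fns, args = ev(se[1], I), ev(se[2], I)
        if not args:                      # I(se2) must be nonempty
            return set()
        result = set()
        for f in fns:
            if f in PARAM:
                result |= I.get(ran(f), set())
        return result
    raise ValueError(tag)


def least_model(constraints):
    """Iterate X ⊇ se, plus the side condition of apply, to a fixed point."""
    I = {}
    while True:
        updates = []
        for X, se in constraints:
            updates.append((X, ev(se, I)))
            if se[0] == "apply":
                # Side condition: λx.e ∈ I(se1) implies I(se2) ⊆ I(X_x).
                for f in ev(se[1], I):
                    if f in PARAM:
                        updates.append((PARAM[f], ev(se[2], I)))
        changed = False
        for X, vals in updates:
            if not vals <= I.get(X, set()):
                I.setdefault(X, set()).update(vals)
                changed = True
        if not changed:
            return I


a, b = ("con", "a", ()), ("con", "b", ())
C = [
    ("X1", ("apply", ("var", "X2"), ("var", "X3"))),
    ("X2", ("lam", "e1")),
    ("X3", ("lam", "e2")),
    ("X4", ("apply", ("var", "X_f"), a)),
    ("X5", ("apply", ("var", "X_f"), b)),
    (ran("e1"), ("con", "c", (("var", "X4"), ("var", "X5")))),
    (ran("e2"), ("var", "X_x")),
]
I = least_model(C)
print(sorted(I["X_x"]))  # [('a', ()), ('b', ())]
```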

⁷ We shall write a as an abbreviation of a().
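$$ \triangleright\ x : (X_x, \{\}) \qquad \text{(VAR)} $$

$$ \frac{\triangleright\ e_1 : (X_1, C_1) \qquad \triangleright\ e_2 : (X_2, C_2)}{\triangleright\ e_1\,e_2 : (Y, \{Y \supseteq \mathrm{apply}(X_1, X_2)\} \cup C_1 \cup C_2)} \qquad \text{(APP)} $$

$$ \frac{\triangleright\ e_i : (X_i, C_i),\ i = 1..n}{\triangleright\ c(e_1,\ldots,e_n) : (Y, \{Y \supseteq c(X_1,\ldots,X_n)\} \cup C_1 \cup \cdots \cup C_n)} \qquad \text{(CONST)} $$

$$ \frac{\triangleright\ e : (X, C)}{\triangleright\ \lambda x.e : (Y, \{Y \supseteq \lambda x.e,\ \mathrm{ran}(\lambda x.e) \supseteq X\} \cup C)} \qquad \text{(ABS)} $$

$$ \frac{\triangleright\ e_1 : (Z_1, C_1) \qquad \triangleright\ e_2 : (Z_2, C_2) \qquad \triangleright\ e_3 : (Z_3, C_3)}{\triangleright\ \mathrm{case}(e_1,\ c(x_1,\ldots,x_n) \Rightarrow e_2,\ y \Rightarrow e_3) : (Y, C \cup C_1 \cup C_2 \cup C_3)} \qquad \text{(CASE)} $$

where $C = \{\, Y \supseteq \mathrm{case}(Z_1,\ c(X_{x_1},\ldots,X_{x_n}) \Rightarrow Z_2,\ X_y \Rightarrow Z_3) \,\}$

$$ \frac{\triangleright\ e : (X, C)}{\triangleright\ \mathrm{fix}\,x.e : (X, \{X_x \supseteq X\} \cup C)} \qquad \text{(FIX)} $$

Figure 3: Construction of Set Constraints (simplified version)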
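The rules of Figure 3 translate almost line for line into a recursive function. The Python sketch below follows the simplified construction only (not the complete version of Appendix I). The term encoding, the naming scheme "X_x", "ran_x" and "Y1", "Y2", ... for set variables, and the function name `construct` are hypothetical choices made for this illustration; an abstraction is identified by its parameter, relying on the convention that bound variables are distinct.

```python
import itertools

# Hypothetical term encoding:
#   ("var", x) | ("con", c, (e1, ..., en)) | ("lam", x, e) | ("app", e1, e2)
#   ("case", e1, c, (x1, ..., xn), e2, y, e3) | ("fix", x, e)
# A constraint (X, rhs) reads X ⊇ rhs; rhs is a set-variable name
# or a tagged tuple.


def construct(term):
    """Return (X, C) with ▷ term : (X, C), per the simplified rules of Figure 3."""
    counter = itertools.count(1)

    def fresh():
        return f"Y{next(counter)}"

    def go(e):
        tag = e[0]
        if tag == "var":                                    # (VAR)
            return f"X_{e[1]}", []
        if tag == "app":                                    # (APP)
            x1, c1 = go(e[1])
            x2, c2 = go(e[2])
            y = fresh()
            return y, [(y, ("apply", x1, x2))] + c1 + c2
        if tag == "con":                                    # (CONST)
            results = [go(a) for a in e[2]]
            y = fresh()
            cs = [(y, ("con", e[1], tuple(x for x, _ in results)))]
            for _, c in results:
                cs += c
            return y, cs
        if tag == "lam":                                    # (ABS)
            x, c = go(e[2])
            y = fresh()
            return y, [(y, ("lam", e[1])), (f"ran_{e[1]}", x)] + c
        if tag == "case":                                   # (CASE)
            _, e1, con, xs, e2, yv, e3 = e
            z1, c1 = go(e1)
            z2, c2 = go(e2)
            z3, c3 = go(e3)
            y = fresh()
            rhs = ("case", z1, con, tuple(f"X_{x}" for x in xs), z2, f"X_{yv}", z3)
            return y, [(y, rhs)] + c1 + c2 + c3
        if tag == "fix":                                    # (FIX)
            x, c = go(e[2])
            return x, [(f"X_{e[1]}", x)] + c
        raise ValueError(f"unknown term tag {tag!r}")

    return go(term)


# e0 = (λf.c(f a, f b)) (λx.x), the example used earlier in this section
e1 = ("lam", "f", ("con", "c", (("app", ("var", "f"), ("con", "a", ())),
                                ("app", ("var", "f"), ("con", "b", ())))))
e2 = ("lam", "x", ("var", "x"))
X, C = construct(("app", e1, e2))
for lhs, rhs in C:
    print(lhs, "⊇", rhs)
```

Unlike the abbreviated listing in the text, this sketch introduces a fresh set variable for each constant (for instance Y1 ⊇ a()), exactly as rule (CONST) prescribes.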


For presentational simplicity, the constraint construction given in Figure 3 is not in complete correspondence with sba(e₀). To see this, consider the term e₀ = e₁ e₂, where e₁ is λf.((λu.f a)(λw.f b)) and e₂ is λx.x. The least ℰ that is safe with respect to e₀ maps f into {λx.x}, u into {λw.f b} and x into {a}, and sba(e₀) is {a}. However, the set constraint construction procedure traverses all subexpressions of e₀. Hence SC(e₀) contains the set expressions apply(X_f, a) and apply(X_f, b). As a result, X_x must contain both a and b, and so the execution of e₀ is approximated by {a, b}. The problem is that the term λw.f b is never “executed” under the set based semantics, but is traversed by the set constraint construction process. To rectify this situation, the constraint construction must be such that if λx.e appears in e₀, then the constraints constructed for e are vacuously satisfied whenever X_x (the set of values for x) is empty. The complete constraint construction procedure appears in Appendix I. The correspondence between sba(e₀) and SC(e₀) is given by the following lemma⁸:

Lemma 1 Let e₀ be a closed term, let SC(e₀) be (X, C) and let I_lm = lm(C). Then I_lm(X) = ||sba(e₀)||.

Proof Sketch: The proof is fairly lengthy and consists of two main parts. The first part involves modifying the definition of ℰ ⊢ e ↝ v so that environments are removed from closures. Call this new system ⊢'. The proof for this part involves showing a correspondence between ⊢ and ⊢'. The second part then relates ⊢' with SC(e₀) by showing two relationships: (a) if ℰ is the least set environment that is safe wrt e₀ (in ⊢'), then ℰ can be used to define a model I of C such that ℰ ⊢' e₀ ↝ v iff v ∈ I(X); and (b) if I is a model of C, then we can define an ℰ that is safe wrt e₀ such that if ℰ ⊢' e₀ ↝ v then v ∈ I(X). In essence, part (a) shows that I_lm(X) ⊆ ||sba(e₀)||, and part (b) shows that I_lm(X) ⊇ ||sba(e₀)||. □

⁸We note that Lemma 1 also holds for the constraint construction process described in Figure 3 provided the following condition is satisfied: the least set environment ℰ that is safe wrt e0 is such that ℰ(x) ≠ {} for all x.
input a collection C of set constraints;
repeat
    if X ⊇ apply(X1, X2) and X1 ⊇ λx.e both appear in C then
        add X ⊇ ran(λx.e) to C;
        add X_x ⊇ X2 to C;
    if X ⊇ case(Y1, c(W1, ..., Wn) ⇒ Y2, W ⇒ Y3)
        and Y1 ⊇ c(Z1, ..., Zn) both appear in C
        and lm(explicit(C))(Zi) ≠ {}, i = 1..n, then
        add X ⊇ Y2 to C;
        add Wi ⊇ Zi to C, i = 1..n;
    if X ⊇ case(Y1, c(W1, ..., Wn) ⇒ Y2, W ⇒ Y3)
        and Y1 ⊇ c'(Z1, ..., Zn) both appear in C, where c' ≠ c,
        and lm(explicit(C))(Zi) ≠ {}, i = 1..n, then
        add X ⊇ Y3 to C;
        add W ⊇ c'(Z1, ..., Zn) to C;
    if X ⊇ ifnonempty(Y1, Y2) appears in C and lm(explicit(C))(Y1) ≠ {} then
        add X ⊇ Y2 to C;
    if X ⊇ X' and X' ⊇ ae both appear in C,
        where ae is atomic and not a set variable, then
        add X ⊇ ae to C;
until no step changes C;
output explicit(C);

Figure  4:  Set  Constraint  Simplification  Algorithm 


Set  Constraint  Algorithm 


We first address the issue of the output format of the algorithm. What we desire is an explicit representation of the least model of the set constraints and, specifically, of sba(e0). Since these sets are typically infinite, we must deal with finite representations of infinite sets. What is needed is a representation from which simple questions such as membership, emptiness and containment can be directly determined. The representation we use is based on a restricted form of set constraints. Specifically, a set expression is atomic if it is either an abstraction λx.e that appears in e0, a set variable, or of the form c(ae1, ..., aen) where each aei is atomic. A constraint is in explicit form if it has the form X ⊇ ae where ae is an atomic set expression that is not a set variable (ae may of course contain set variables). A collection of constraints is in explicit form if each constraint therein is in explicit form. If C is a collection of constraints, then explicit(C) denotes the explicit form constraints of C. We note that explicit form constraints can be regarded as regular tree grammars by treating set variables as non-terminals and regarding a constraint X ⊇ ae as a production X ⇒ ae.
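To make the grammar reading concrete, the following Python sketch stores each explicit form constraint X ⊇ ae as a production X ⇒ ae and tests whether a ground value belongs to a set variable. The tuple encoding of atomic expressions and values, and the helper names `member` and `matches`, are assumptions made for illustration only; the sketch relies on the explicit form restriction that no right-hand side is a bare set variable.

```python
# A minimal sketch, assuming a hypothetical tuple encoding:
#   atomic expressions: ("var", Y), ("lam", label), ("con", c, (ae1, ..., aen))
#   ground values:      ("lam", label), ("con", c, (v1, ..., vn))
# grammar maps each set variable X to the right-hand sides ae of its
# explicit form constraints X ⊇ ae, i.e. the productions X ⇒ ae.

def matches(grammar, value, ae):
    """True if value is in the set denoted by the atomic expression ae."""
    if ae[0] == "var":
        return member(grammar, value, ae[1])
    if ae[0] == "lam":
        return value == ae
    _, c, args = ae
    return (value[0] == "con" and value[1] == c
            and len(value[2]) == len(args)
            and all(matches(grammar, v, a) for v, a in zip(value[2], args)))

def member(grammar, value, x):
    """True if value is in the least model of the grammar at variable x.

    Terminates because explicit form right-hand sides are never bare set
    variables, so each step through a variable consumes a constructor.
    """
    return any(matches(grammar, value, ae) for ae in grammar.get(x, []))

# Example grammar: L ⇒ nil | cons(A, L),  A ⇒ a  (lists of a's)
NIL = ("con", "nil", ())
A_VAL = ("con", "a", ())
grammar = {
    "L": [NIL, ("con", "cons", (("var", "A"), ("var", "L")))],
    "A": [A_VAL],
}
print(member(grammar, ("con", "cons", (A_VAL, NIL)), "L"))               # True
print(member(grammar, ("con", "cons", (("con", "b", ()), NIL)), "L"))  # False
```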

The simplification algorithm accepts as input a collection of constraints (such as those constructed for a closed term e0) and outputs an explicit form collection of constraints that has the same least model as the input collection. The main part of the algorithm involves exhaustively applying a series of simplification steps, and this serves to add new explicit form constraints so that information about lm(C) is incrementally transferred into the explicit part of C. The algorithm terminates exactly when all information about lm(C) is present in explicit(C). The details of the algorithm appear in Figure 4. The phrase "add X ⊇ se to C" is used to mean "add the constraint X ⊇ se if it does not already appear". An expression of the form lm(explicit(C))(Y) ≠ {} indicates a test which can be performed as follows: construct explicit(C) and, using standard algorithms, check whether Y is empty in the least model of explicit(C) (analogous procedures can be found in [7, 9]).

We note that the correctness of the algorithm relies on the fact that there are no "nested" set expressions. In other words, if an expression of the form apply(se1, se2) appears in the constraints, then se1 and se2 are both set variables, and similarly for expressions involving ifnonempty and case. It is easy to see that SC(e0) satisfies this property, and it is trivial to verify that the algorithm preserves it. The next lemma establishes the correctness of the simplification algorithm and, combined with Lemma 1, proves Theorem 2.

Lemma 2 (Correctness of Algorithm) The algorithm terminates on input C and outputs explicit form constraints C' such that lm(C') = lm(C).

Proof Sketch: Termination is straightforward to verify since the algorithm adds only constraints of the form X ⊇ ae where both X and ae are expressions that already appear in the constraints. The main part of the proof is to establish that the transformation steps are complete in the sense that when no further transformation steps can be applied, then lm(C) = lm(explicit(C)). This is achieved by showing that when no further transformation can be applied, the interpretation lm(explicit(C)) is in fact a model of C. □

Theorem 2 Given a closed term e0, there is an O(n³) algorithm to compute an explicit representation (which is equivalent to a regular tree grammar) of ||sba(e0)||.

Proof: Let SC(e0) be (X, C). By Lemma 1, lm(C) maps X to ||sba(e0)||. By Lemma 2, the set constraint simplification algorithm produces a collection of constraints C' in explicit form when given C as input. Moreover, lm(C') = lm(C). Hence lm(C')(X) = lm(C)(X) = ||sba(e0)||, and so C' provides an explicit representation of ||sba(e0)||. The O(n³) bound can be established as follows. First, the construction of constraints is linear in the size of e0. Second, at most n² new constraints can be added by the simplification algorithm, and the cost of "adding" each new constraint (i.e. determining what other new constraints need to be added, given that this constraint is added) can be bounded by O(n). □

In addition, the algorithm trivially has an O(n²) space bound. We remark that this algorithm not only provides a way to compute sba(e0), but also computes the least set environment that is safe wrt e0.


4  Arrays,  Continuations,  Exceptions  and  Arithmetic 


Thus far we have presented a formal development of the core ideas of set based analysis. We now informally outline the extensions we have employed for dealing with arrays, exceptions and continuations. As outlined in the introduction, the set based treatment of arrays ignores dependencies between subscripts and values. That is, an array is treated as a set of values such that when the array is updated the new value(s) are added to this set, and when the array is accessed the set of values is returned. More concretely, for each place in the program where an array can be generated, we introduce a special distinct constant ar with two associated set variables length(ar) and contents(ar). We also introduce two new set expressions, contentsof(se) and update(se1, se2). In essence, the first denotes the union of the sets contents(ar) such that ar is an element of se. The second is either (i) the empty set if either se1 or se2 is empty, (ii) the singleton set containing the unit value provided se1 and se2 are non-empty and contents(ar) ⊇ se2 for all ar in se1, or (iii) undefined otherwise. The following rules are suggestive⁹ of how constraints are constructed for programs involving arrays. In the first rule, ar is a new constant.


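$$\frac{\triangleright\ e_1 : (X_1, C_1) \qquad \triangleright\ e_2 : (X_2, C_2)}{\triangleright\ \mathsf{array}(e_1, e_2) : (Y,\ \{Y \supseteq ar,\ \mathit{contents}(ar) \supseteq X_1,\ \mathit{length}(ar) \supseteq X_2\} \cup C_1 \cup C_2)} \quad \text{(ARRAY)}$$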


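$$\frac{\triangleright\ e_1 : (X_1, C_1) \qquad \triangleright\ e_2 : (X_2, C_2)}{\triangleright\ e_1\ \mathsf{sub}\ e_2 : (Y,\ \{Y \supseteq \mathit{contentsof}(X_1)\} \cup C_1 \cup C_2)} \quad \text{(SUBSCRIPT)}$$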


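$$\frac{\triangleright\ e_i : (X_i, C_i),\ i = 1..3}{\triangleright\ \mathsf{update}(e_1, e_2, e_3) : (Y,\ \{Y \supseteq \mathit{update}(X_1, X_3)\} \cup C_1 \cup C_2 \cup C_3)} \quad \text{(UPDATE)}$$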
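The set based reading of arrays can be mimicked with ordinary Python sets. The sketch below is an illustration under stated assumptions, not the implementation: array constants are plain strings, the dictionary `contents` stands in for the set variables contents(ar), and case (iii) of update is modeled the way a constraint solver would enforce it, by adding se2 to each contents(ar). Because indices are never consulted, a subscript yields every value ever stored in the array.

```python
# A minimal sketch, assuming array constants are strings such as "ar1" and
# contents maps each array constant to the set recorded in contents(ar).

def contentsof(se, contents):
    """contentsof(se): union of contents(ar) over the array constants ar in se."""
    result = set()
    for ar in se:
        result |= contents.get(ar, set())
    return result

def update(se1, se2, contents):
    """update(se1, se2). Instead of being undefined (case iii), the
    constraint forces contents(ar) ⊇ se2 for every ar in se1."""
    if not se1 or not se2:
        return set()                           # case (i)
    for ar in se1:
        contents.setdefault(ar, set()).update(se2)
    return {"unit"}                            # case (ii)

contents = {"ar1": {0}}      # contents(ar1) already holds the initial value 0
update({"ar1"}, {3}, contents)                 # store 3 at some index
update({"ar1"}, {5}, contents)                 # store 5 at another index
print(sorted(contentsof({"ar1"}, contents)))   # [0, 3, 5]
```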


Continuations are also modeled by introducing a new constant cont for each callcc appearing in a program. Each new constant has an associated set variable contents(cont), which records the values that are thrown to the continuation. In effect, the constant cont passes into the term e a reference to the program point at which the callcc occurred (in fact it passes down the set variable corresponding to this point). The set expression throw(se1, se2) is either (i) the empty set provided that contents(cont) ⊇ se2 for each cont in se1, or (ii) undefined otherwise.


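$$\frac{\triangleright\ e : (X, C)}{\triangleright\ \mathsf{callcc}\ x.e : (Y,\ \{Y \supseteq X,\ Y \supseteq \mathit{contents}(cont),\ X_x \supseteq cont\} \cup C)} \quad \text{(CALLCC)}$$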


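$$\frac{\triangleright\ e_1 : (X_1, C_1) \qquad \triangleright\ e_2 : (X_2, C_2)}{\triangleright\ \mathsf{throw}(e_1, e_2) : (Y,\ \{Y \supseteq \mathit{throw}(X_1, X_2)\} \cup C_1 \cup C_2)} \quad \text{(THROW)}$$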
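The following Python sketch mirrors the CALLCC and THROW rules with plain sets. The string constants standing for cont, the dictionary `contents` standing for contents(cont), and the function names are assumptions for illustration. It processes the throw before the callcc result is read; a real constraint solver would instead iterate to a fixed point.

```python
# A minimal sketch: continuation constants are strings such as "k1", and
# contents maps each one to the values thrown to it, i.e. contents(cont).

def throw(se1, se2, contents):
    """throw(se1, se2): record se2 in contents(cont) for each cont in se1.
    A throw never returns normally, so its own value set is empty."""
    for cont in se1:
        contents.setdefault(cont, set()).update(se2)
    return set()

def callcc_values(body_values, cont, contents):
    """Y for callcc x.e: Y ⊇ X (normal return) and Y ⊇ contents(cont)."""
    return set(body_values) | contents.get(cont, set())

contents = {}
# callcc k.(... throw(k, "early") ... "normal"), where X_k ⊇ {"k1"}
throw({"k1"}, {"early"}, contents)
print(sorted(callcc_values({"normal"}, "k1", contents)))   # ['early', 'normal']
```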


Exceptions are modeled by introducing a distinct new set variable EXC to capture all of the exceptions that are raised during program execution. We note that exceptions could be treated more accurately by introducing a new exception variable for each expression. This would provide better "separation" of the exceptions raised by different parts of a program, but at the cost of introducing more constraints. We are currently investigating this tradeoff.

⁹In particular, they are a simplification of the actual rules in the sense that Figure 3 simplifies Figure 5.
$$\frac{\triangleright\ e : (X, C)}{\triangleright\ \mathsf{raise}\ e : (Y,\ \{\mathit{EXC} \supseteq X\} \cup C)} \quad \text{(RAISE)}$$


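$$\frac{\triangleright\ e_1 : (X_1, C_1) \qquad \triangleright\ e_2 : (X_2, C_2)}{\triangleright\ e_1\ \mathsf{handle}\ (\lambda x.e_2) : (Y,\ \{Y \supseteq X_1,\ Y \supseteq X_2,\ X_x \supseteq \mathit{EXC}\} \cup C_1 \cup C_2)} \quad \text{(HANDLE)}$$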
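The loss of separation caused by a single EXC variable is easy to see in a small Python sketch of the RAISE and HANDLE rules. The set `exc` plays the role of EXC, and exceptions are plain strings; both are assumptions of the sketch, as are the function names. Raises are processed before the handler here, whereas a real solver would iterate to a fixed point.

```python
# A minimal sketch: exc plays the role of the single set variable EXC.

def analyze_raise(raised, exc):
    """raise e: EXC ⊇ X. A raise has no normal result, so return {}."""
    exc |= raised
    return set()

def analyze_handle(body_values, handler, exc):
    """e1 handle (λx.e2): X_x ⊇ EXC, Y ⊇ X1 and Y ⊇ X2."""
    x_values = set(exc)
    return set(body_values) | handler(x_values)

exc = set()
analyze_raise({"Overflow"}, exc)     # raised in one part of the program
analyze_raise({"Subscript"}, exc)    # raised in an unrelated part
# A handler that returns the exception it catches sees both exceptions,
# because a single EXC does not separate them.
print(sorted(analyze_handle({"ok"}, lambda xs: xs, exc)))   # ['Overflow', 'Subscript', 'ok']
```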


We conclude by considering arithmetic. In essence, we shall treat arithmetic operations like data constructors. For example, the analysis of the power program given below yields the explicit form constraints shown after it (where X is the set variable corresponding to the result of the program, and only solved form constraints relevant to X are shown).


let fun power(0, n) = 1
      | power(m, n) = n * power(m-1, n)
in
  power(3, 4)
end

X ⊇ 4 × X
X ⊇ 1

In  other  words,  what  is  obtained  is  a  description  of  how  the  value(s)  in  question  were  computed. 
The  result  for  this  program  should  be  read  as:  the  value  computed  by  the  program  is  either  1  or 
the  result  of  multiplying  1  by  4  an  arbitrary  number  of  times.  We  omit  further  details  for  space 
reasons. 
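To see why these two constraints describe powers of 4, the following Python sketch unfolds the production X ⇒ 4 × X a bounded number of times, starting from X ⇒ 1, and evaluates the resulting terms. The function name and the bound parameter are our own illustration, not part of the analysis.

```python
# A minimal sketch: enumerate the integers denoted by the explicit form
# constraints X ⊇ 4 × X and X ⊇ 1, up to a bounded number of products.

def values_of_x(max_products):
    """Integers denoted by terms of X ⇒ 1 | 4 × X using at most
    max_products applications of the × production."""
    values = {1}
    for _ in range(max_products):
        values |= {4 * v for v in values}
    return values

print(sorted(values_of_x(3)))   # [1, 4, 16, 64]
```

The actual result of the program, 4³ = 64, is among these values; the grammar itself does not record how many multiplications take place.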


5  Implementation 


An implementation of set based analysis for ML has been developed over the last two years.
The system is built on top of the SML-NJ compiler. Starting with the lambda intermediate
representation  of  a  program,  our  system  incrementally  builds  and  solves  corresponding  set 
constraints.  Many  of  the  set  constraints  that  are  generated  are  trivial,  and  so  an  important  part  of 
the  effort  to  make  the  analyzer  efficient  was  directed  at  ensuring  that  such  constraints  are  solved 
“on-the-fly”,  and  are  never  explicitly  generated. 

Space  does  not  permit  us  to  describe  the  treatment  of  arithmetic,  and  in  particular,  the 
treatment  of  if  statements  involving  arithmetic  expressions  and  comparisons.  However,  we  note 
that  early  results  suggest  that  the  current  implementation  deals  adequately  with  these  constructs, 
and  can  be  usefully  employed  to  address  issues  such  as  the  removal  of  unnecessary  array  bounds 
checking. 

An important aspect of the implementation is "poly-variance" (the analysis analogue of polymorphism). That is, the implementation provides a mechanism to construct different "versions" of functions. In essence this is done by constraint duplication. However, for efficiency reasons, we wish to avoid multiple passes over the input lambda expression, and so we first convert the lambda expression into a compact internal format, from which multiple copies of constraints can be rapidly generated. A key aspect of poly-variance is how to control the generation of different versions of functions. One approach is to use the type information of a program (e.g. if a function is polymorphic, then it is likely to be useful to treat it as a poly-variant function). However, a goal of our implementation was to provide a generic analysis tool for functional programs, and so we did not want to commit to a typed language. Instead we chose a scheme in which the program is analyzed twice: the first pass is a "mono-variant" analysis, and the second pass uses information from the first to control a poly-variant analysis.
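The paper describes poly-variance only as constraint duplication from a compact internal format. As a hypothetical sketch of the duplication step alone, the Python below keeps a function's constraints once as a template and produces a renamed copy per version. The encoding, the `name@version` naming scheme and the helper names are our assumptions, and the sketch says nothing about how versions are chosen.

```python
# A hypothetical sketch of constraint duplication: the constraints of a
# function body are stored once as a template, and each version receives a
# copy in which the function's local set variables are renamed.

def rename(item, version, local_vars):
    """Recursively tag every local set-variable name in item with version."""
    if isinstance(item, tuple):
        return tuple(rename(part, version, local_vars) for part in item)
    if item in local_vars:
        return f"{item}@{version}"
    return item

def instantiate(template, version, local_vars):
    return [rename(c, version, local_vars) for c in template]

# Template for the identity function λx.x: ran(λx.x) ⊇ X_x.
template = [("ran_id", ("var", "X_x"))]
local = {"ran_id", "X_x"}
print(instantiate(template, 1, local))   # [('ran_id@1', ('var', 'X_x@1'))]
print(instantiate(template, 2, local))   # [('ran_id@2', ('var', 'X_x@2'))]
```

If two call sites of the identity function receive versions 1 and 2, their arguments flow into X_x@1 and X_x@2 separately, instead of being merged into a single X_x as in a mono-variant analysis.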

The following table presents some preliminary empirical results for the implementation. We use three programs. The first program is the intmap structure from the SML-NJ compiler, which implements a mapping from integers to integers. The second models the game of life, and is written in an applicative (rather than imperative) style. The third is the lexer generator from the ml-lex/ml-yacc collection. All times are in seconds on a PMAX 5000/200 with 64MB of memory, running Mach. The equations column gives the number of "equations" generated¹⁰ (this excludes constraints that are solved on-the-fly). The space column is a crude estimate of the space used to store and manipulate the constraints. Phase I is the mono-variant analysis. Phase II is the poly-variant analysis (which uses information from phase I).


program               phase      time (secs)   equations   space (MB)
intmap (105 lines)    phase I    0.35           1360       0.08
                      phase II   0.37           1580       0.35
life (150 lines)      phase I    0.86           1925       0.18
                      phase II   3.0           13769       1.7
lexgen (1170 lines)   phase I    3.4            6504       1.6
                      phase II   3.7            9181       1.3

We expect substantial improvement in the running time and space of poly-variant analysis as the control of constraint duplication is further developed. Note that the space requirements of the poly-variant analysis are sometimes less than those for the mono-variant analysis (as for lexgen). This is because the mono-variant analysis effectively folds different uses of a function together, and although this results in fewer set variables, it may substantially increase the number of constraints per variable (in the final explicit form constraints).


6  Conclusion 


Starting with the simple intuition of treating program variables as sets, we have developed a powerful, general and flexible analysis for higher-order call-by-value functional languages. The contributions of the paper lie in two areas. First, in the very direct and appealing connection between a program's set based approximation (which is what our algorithm computes) and its underlying operational semantics. Second, in the presentation of an algorithm (and implementation thereof) which combines (a) an accurate treatment of data-structures, (b) modeling of side-effecting operations and (c) practicality.

¹⁰The implementation collects all constraints with the same left-hand-side variable together, and the resulting object is effectively an equation.
Appendix I: Construction of Set Constraints


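$$Z \triangleright x : (X_x, \{\})$$

$$\frac{Z \triangleright e_1 : (X_1, C_1) \qquad Z \triangleright e_2 : (X_2, C_2)}{Z \triangleright e_1\, e_2 : (Y,\ \{Y \supseteq \mathit{apply}(Y', X_2),\ Y' \supseteq \mathit{ifnonempty}(Z, X_1)\} \cup C_1 \cup C_2)}$$

$$\frac{Z \triangleright e_i : (X_i, C_i),\ i = 1..n}{Z \triangleright c(e_1, \ldots, e_n) : (Y,\ \{Y \supseteq c(X_1, \ldots, X_n)\} \cup C_1 \cup \cdots \cup C_n)}$$

$$\frac{X_x \triangleright e : (X, C)}{Z \triangleright \lambda x.e : (Y,\ \{Y \supseteq \lambda x.e,\ \mathit{ran}(\lambda x.e) \supseteq X\} \cup C)}$$

$$\frac{Z \triangleright e_1 : (X_1, C_1) \qquad Z \triangleright e_2 : (X_2, C_2) \qquad Z \triangleright e_3 : (X_3, C_3)}{Z \triangleright \mathsf{case}(e_1,\ c(x_1, \ldots, x_n) \Rightarrow e_2,\ x \Rightarrow e_3) : (Y,\ C \cup C_1 \cup C_2 \cup C_3)}$$

where $C = \{Y \supseteq \mathit{case}(Y', c(X_{x_1}, \ldots, X_{x_n}) \Rightarrow X_2,\ X_x \Rightarrow X_3),\ Y' \supseteq \mathit{ifnonempty}(Z, X_1)\}$

$$\frac{Z \triangleright e : (X, C)}{Z \triangleright \mathsf{fix}\ x.e : (X_x,\ \{X_x \supseteq \mathit{ifnonempty}(Z, X)\} \cup C)}$$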

Figure  5:  Construction  of  Set  Constraints  (complete  version) 

Figure 5 presents the complete details of the construction of set constraints for a term. The main difference between Figure 5 and Figure 3 is that the relation Z ▷ e : (se, C) recursively passes down a set variable Z which is empty if the expression under consideration is never called, and is non-empty otherwise. The key property of the relation Z ▷ e : (se, C) is that if Z is empty then C is vacuously true, and if Z is nonempty, then se and C are equivalent to those constructed using the simpler deductive system in Figure 3. We now define SC(e0) as follows: if Z is a new set variable and Z ▷ e0 : (X, C), then SC(e0) is the pair (X, {Z ⊇ ε} ∪ C) where ε is some arbitrary sc-value. Note that all sc-values are set expressions and that the choice of ε is arbitrary: its only purpose is to force the variable Z to be nonempty, since otherwise the constraints C would be vacuously true.
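A partial Python sketch of the Figure 5 construction helps show how the gating variable works. It covers variables, constructors, abstractions and applications only (case and fix are omitted); the tuple encoding of terms, the fresh names Y0, Y1, ..., the names X_x for parameters, and the constant "unit" chosen as ε are all assumptions of the sketch.

```python
import itertools

# A partial sketch of Figure 5. Terms are hypothetical tuples:
#   ("var", x) | ("lam", x, body, label) | ("app", e1, e2) | ("con", c, (e1, ..., en))
# A constraint (Y, rhs) means Y ⊇ rhs; ("ran", label) stands for ran(λx.e).

def construct(term):
    fresh = (f"Y{i}" for i in itertools.count())
    cs = []

    def gen(z, t):
        if t[0] == "var":                        # Z ▷ x : (X_x, {})
            return f"X_{t[1]}"
        if t[0] == "con":                        # Y ⊇ c(X1, ..., Xn)
            args = tuple(gen(z, a) for a in t[2])
            y = next(fresh)
            cs.append((y, ("con", t[1], args)))
            return y
        if t[0] == "lam":                        # body analyzed under X_x
            _, x, body, label = t
            xb = gen(f"X_{x}", body)
            y = next(fresh)
            cs.append((y, ("lam", label)))
            cs.append((("ran", label), ("var", xb)))
            return y
        if t[0] == "app":                        # application gated by Z
            x1, x2 = gen(z, t[1]), gen(z, t[2])
            y, y1 = next(fresh), next(fresh)
            cs.append((y1, ("ifnonempty", z, x1)))
            cs.append((y, ("apply", y1, x2)))
            return y
        raise ValueError(f"unsupported term: {t[0]}")

    x = gen("Z", term)
    cs.append(("Z", ("con", "unit", ())))       # Z ⊇ ε forces Z nonempty
    return x, cs

# The example e0 = e1 e2 with e1 = λf.((λu.f a)(λw.f b)) and e2 = λx.x.
a, b, f = ("con", "a", ()), ("con", "b", ()), ("var", "f")
e1 = ("lam", "f", ("app", ("lam", "u", ("app", f, a), "lu"),
                          ("lam", "w", ("app", f, b), "lw")), "lf")
e0 = ("app", e1, ("lam", "x", ("var", "x"), "lx"))
_, cs = construct(e0)
print([rhs for _, rhs in cs if rhs[0] == "ifnonempty"])
```

For the example term e0 discussed after Figure 3, the application f b inside λw.f b is gated by ifnonempty(X_w, X_f). Since λw.f b is never applied, X_w stays empty, so that application contributes nothing and b no longer reaches X_x.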